Optimal Cluster Preserving Embedding of Nonmetric Proximity Data
نویسندگان
چکیده
For several major applications of data analysis, objects are often not represented as feature vectors in a vector space, but rather by a matrix gathering pairwise proximities. Such pairwise data often violates metricity and, therefore, cannot be naturally embedded in a vector space. Concerning the problem of unsupervised structure detection or clustering, in this paper, a new embedding method for pairwise data into Euclidean vector spaces is introduced. We show that all clustering methods, which are invariant under additive shifts of the pairwise proximities, can be reformulated as grouping problems in Euclidian spaces. The most prominent property of this constant shift embedding framework is the complete preservation of the cluster structure in the embedding space. Restating pairwise clustering problems in vector spaces has several important consequences, such as the statistical description of the clusters by way of cluster prototypes, the generic extension of the grouping procedure to a discriminative prediction rule, and the applicability of standard preprocessing methods like denoising or dimensionality reduction.
منابع مشابه
Distributional Scaling: An Algorithm for Structure-Preserving Embedding of Metric and Nonmetric Spaces
We present a novel approach for embedding general metric and nonmetric spaces into lowdimensional Euclidean spaces. As opposed to traditional multidimensional scaling techniques, which minimize the distortion of pairwise distances, our embedding algorithm seeks a low-dimensional representation of the data that preserves the structure (geometry) of the original data. The algorithm uses a hybrid ...
متن کاملIndefinite Proximity Learning: A Review
Efficient learning of a data analysis task strongly depends on the data representation. Most methods rely on (symmetric) similarity or dissimilarity representations by means of metric inner products or distances, providing easy access to powerful mathematical formalisms like kernel or branch-and-bound approaches. Similarities and dissimilarities are, however, often naturally obtained by nonmetr...
متن کاملVisualizing Graphs as Maps with Contiguous Regions
Relational datasets, which include clustering information, can be visualized with tools such as BubbleSets, LineSets, SOM, and GMap. The countries in SOM-based and GMap-based visualizations are fragmented, i.e., they are represented by several disconnected regions. While BubbleSets and LineSets have contiguous regions, these regions may overlap, even when the input clustering is non-overlapping...
متن کاملPairwise Data Clustering by Deterministic Annealing - Pattern Analysis and Machine Intelligence, IEEE Transactions on
Partitioning a data set and extracting hidden structure from the data arises in different application areas of pattern recognition, speech and image processing. Pairwise data clustering is a combinatorial optimization method for data grouping which extracts hidden structure from proximity data. We describe a deterministic annealing approach to pairwise clustering which shares the robustness pro...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IEEE Trans. Pattern Anal. Mach. Intell.
دوره 25 شماره
صفحات -
تاریخ انتشار 2003